NSF PAR Search | NSF Public Access Repository

RT-Bench: A Long Overdue Update

Nicolella, Mattia; Hoornaert, Denis; Mancuso, Renato (July 2025, 19th Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2025))

RT-Bench is a framework and community project that aims to establish a unified set of benchmarks with a homogeneous launch and result reporting interface, and with a simple build system. RT-Bench targets academic researchers and industry practitioners interested in understanding the performance characteristics of embedded/real-time systems when tested over realistic use-case applications. To facilitate real-time systems research, RT-Bench is designed from the ground up to include a set of fundamental capabilities such as periodic execution, selectable OS scheduler, and native and multi-architecture performance counters support, to name a few. RT-Bench has undergone continuous improvements and extensions. This paper reviews the most recent additions and features of the framework. Most prominently, these include heap migration, synchronized benchmark release, and experimental support for multi-threaded applications. This contribution includes a tutorial session with template benchmarks to showcase the new features and illustrate the process of integrating new benchmark suites.
Free, publicly-accessible full text available July 7, 2026
UltraScale+ SpinalHDL Wrapper: Streamlining Ideas to Bitstream on UltraScale+ platforms

Hoornaert, Denis; Corradi, Giulio; Mancuso, Renato; Caccamo, Marco (July 2025, 19th Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2025))

In an embedded computing landscape that inexorably leans into heterogeneity, System-on-Chips (SoCs) featuring tightly integrated Field Programmable Gate Arrays (FPGAs) are bound to proliferate. In particular, such architectures’ high degree of flexibility and control caters well to the real-time community. Despite the appeal, real-time research exploiting HW/SW co-design on such architectures has remained tepid. While the usual suspects, such as the complexity of Hardware Description Languages, can be blamed, recent advancements in tooling (e.g., languages, frameworks) have proven efficient in easing the design of FPGA-located accelerators. However, in the context of SoC with FPGA platforms, these solutions fall short of addressing the next hurdle: integrating the custom accelerators with the rest of the SoC, which requires the tedious implementation of various supporting software resources. This article presents the first iteration of the UltraScale+ SpinalHDL Wrapper, a SpinalHDL library dedicated to supporting HW/SW co-design on SoC with FPGA platforms. The support ranges from assisting during the design of accelerators to automatically inferring and generating ready-to-use software support, such as Linux Kernel modules and Vivado deployment scripts.
Free, publicly-accessible full text available July 7, 2026
Flight Testing Instrumentation Development and Integration for a Subscale Integrated High Lift Propulsor Testbed

https://doi.org/10.2514/6.2025-2416

Dantsker, Or D; Mancuso, Renato; Ward, Byron; Miller, Christian; Shea, Donovan F; Collins, Kyle (January 2025, American Institute of Aeronautics and Astronautics)

Full Text Available
Coherence-Aided Memory Bandwidth Regulation

https://doi.org/10.1109/RTSS62706.2024.00035

Izhbirdeev, Ivan; Hoornaert, Denis; Chen, Weifan; Zuepke, Alexander; Hammad, Youssef; Caccamo, Marco; Mancuso, Renato (December 2024, IEEE)

Full Text Available
Unified Local-Cloud Decision-Making via Reinforcement Learning

https://doi.org/10.1007/978-3-031-72940-9_11

Sengupta, Kathakoli; Shangguan, Zhongkai; Bharadwaj, Sandesh; Arora, Sanjay; Ohn-Bar, Eshed; Mancuso, Renato (November 2024, Springer Nature Switzerland)

Full Text Available
The Omnivisor: A Real-Time Static Partitioning Hypervisor Extension for Heterogeneous Core Virtualization over MPSoCs

https://doi.org/10.4230/LIPIcs.ECRTS.2024.7

Ottaviano, Daniele; Ciraolo, Francesco; Mancuso, Renato; Cinque, Marcello (July 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Following the needs of industrial applications, virtualization has emerged as one of the most effective approaches for the consolidation of mixed-criticality systems while meeting tight constraints in terms of space, weight, power, and cost (SWaP-C). In embedded platforms with homogeneous processors, a wealth of works have proposed designs and techniques to enforce spatio-temporal isolation by leveraging well-understood virtualization support. Unfortunately, achieving the same goal on heterogeneous MultiProcessor Systems-on-Chip (MPSoCs) has been largely overlooked. Modern hypervisors are designed to operate exclusively on main cores, with little or no consideration given to other co-processors within the system, such as small microcontroller-level CPUs or soft-cores deployed on programmable logic (FPGA). Typically, hypervisors consider co-processors as I/O devices allocated to virtual machines that run on primary cores, yielding full control and responsibility over them. Nevertheless, inadequate management of these resources can lead to spatio-temporal isolation issues within the system. In this paper, we propose the Omnivisor model as a paradigm for the holistic management of heterogeneous platforms. The model generalizes the features of real-time static partitioning hypervisors to enable the execution of virtual machines on processors with different Instruction Set Architectures (ISAs) within the same MPSoC. Moreover, the Omnivisor ensures temporal and spatial isolation between virtual machines by integrating and leveraging a variety of hardware and software protection mechanisms. The presented approach not only expands the scope of virtualization in MPSoCs but also enhances the overall system reliability and real-time performance for mixed-criticality applications. A full open-source reference implementation of the Omnivisor based on the Jailhouse hypervisor is provided, targeting ARM real-time processing units and RISC-V soft-cores on FPGA. 
Experimental results on real hardware show the benefits of the solution, including enabling the seamless launch of virtual machines on different ISAs and extending spatial/temporal isolation to heterogeneous cores with enhanced regulation policies.
Full Text Available
Shared Resource Contention in MCUs: A Reality Check and the Quest for Timeliness

https://doi.org/10.4230/LIPIcs.ECRTS.2024.5

Oliveira, Daniel; Chen, Weifan; Pinto, Sandro; Mancuso, Renato (July 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Microcontrollers (MCUs) are steadily embracing multi-core technology to meet growing performance demands. This trend marks a shift from their traditionally simple, deterministic designs to more complex and inherently less predictable architectures. While shared resource contention is well-studied in mid- to high-end embedded systems, the emergence of multi-core architectures in MCUs introduces unique challenges and characteristics that existing research has not fully explored. In this paper, we conduct an in-depth investigation of both mainstream and next-generation MCU-based platforms, aiming to identify the sources of contention on systems typically lacking these problems. We empirically demonstrate substantial contention effects across different MCU architectures (i.e., from single- to multi-core configurations), highlighting significant application slowdowns. Notably, we observe that slowdowns can reach several orders of magnitude, with the most extreme cases showing up to a 3800x (times, not percent) increase in execution time. To address these issues, we propose and evaluate muTPArtc, a novel mechanism designed for Timely Progress Assessment (TPA) and TPA-based runtime control specifically tailored to MCUs. muTPArtc is an MCU-specialized TPA-based mechanism that leverages hardware facilities widely available in commercial off-the-shelf MCUs (i.e., hardware breakpoints and cycle counters) to monitor applications' progress and to detect and mitigate timing violations. Our results demonstrate that muTPArtc effectively manages performance degradation due to interference, requiring only minimal modifications to the build pipeline and no changes to the source code of the target application, while incurring minor overheads.
Full Text Available
MCTI: mixed-criticality task-based isolation

https://doi.org/10.1007/s11241-024-09425-5

Hoornaert, Denis; Ghaemi, Golsana; Bastoni, Andrea; Mancuso, Renato; Caccamo, Marco; Corradi, Giulio (July 2024, Real-Time Systems)

The ever-increasing demand for high performance in the time-critical, low-power embedded domain drives the adoption of powerful but unpredictable, heterogeneous Systems-on-Chip. On these platforms, the main source of unpredictability (the shared memory subsystem) has been widely studied, and several approaches to mitigate undesired effects have been proposed over the years. Among them, performance-counter-based regulation methods have proved particularly successful. Unfortunately, such regulation methods require precise knowledge of each task’s memory consumption and cannot be extended to isolate mixed-criticality tasks running on the same core as the regulation budget is shared. Moreover, the desirable combination of these methodologies with well-known time-isolation techniques, such as server-based reservations, is still uncharted territory and lacks a precise characterization of possible benefits and limitations. Recognizing the importance of such consolidation for designing predictable real-time systems, we introduce MCTI (Mixed-Criticality Task-based Isolation) as a first step in this direction. MCTI is a hardware/software co-design architecture that aims to improve both CPU and memory isolation among tasks with different criticalities even when they share the same CPU. In order to ascertain the correct behavior and distill the benefits of MCTI, we implemented and tested the proposed prototype architecture on a widely available off-the-shelf platform. The evaluation of our prototype shows that (1) MCTI helps shield critical tasks from concurrent non-critical tasks sharing the same memory budget, with only a limited increase in response time being observed, and (2) critical tasks running under memory stress exhibit an average response time close to that achieved when running without memory stress.
MemPol: polling-based microsecond-scale per-core memory bandwidth regulation

https://doi.org/10.1007/s11241-024-09422-8

Zuepke, Alexander; Bastoni, Andrea; Chen, Weifan; Caccamo, Marco; Mancuso, Renato (June 2024, Real-Time Systems)

In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workloads. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from outside the cores that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.
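The load-aware redistribution mentioned in this abstract can be illustrated with a minimal sketch. It is not the MemPol implementation: the function name `redistribute`, its parameters, and the greedy hand-out order are all assumptions made for illustration. The sketch only shows the stated invariant, namely that unused bandwidth moves to cores that want more while the sum of all budgets stays at or below a global threshold.

```python
def redistribute(base_budget, demand, total_cap):
    """Return per-core budgets for the next regulation period.

    Hypothetical model, not taken from the paper:
      base_budget: guaranteed bytes per period for each core
      demand:      bytes each core tried to use in the last period
      total_cap:   upper bound on the sum of all budgets
    """
    if sum(base_budget) > total_cap:
        raise ValueError("base budgets exceed the global cap")
    # Each core first receives what it asked for, up to its guaranteed share.
    granted = [min(b, d) for b, d in zip(base_budget, demand)]
    spare = total_cap - sum(granted)
    # Hand the spare bandwidth, in core order, to cores that wanted more.
    for i, d in enumerate(demand):
        extra = min(spare, d - granted[i])
        if extra > 0:
            granted[i] += extra
            spare -= extra
    return granted


budgets = redistribute([100, 100, 100], [20, 250, 100], total_cap=300)
print(budgets)
```

In this toy model a core that was idle in the last period starts the next one below its guaranteed share, so a real regulator would need a policy for restoring that share quickly; the sketch leaves that out.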
Effortless Locality on Data Systems Using Relational Fabric

https://doi.org/10.1109/TKDE.2024.3386827

Papon, Tarikul Islam; Mun, Ju Hyoung; Karatsenidis, Konstantinos; Roozkhosh, Shahin; Hoornaert, Denis; Sanaullah, Ahmed; Drepper, Ulrich; Mancuso, Renato; Athanassoulis, Manos (January 2024, IEEE Transactions on Knowledge and Data Engineering)

A key design decision for data systems is whether they follow the row-store or the column-store paradigm. The former supports transactional workloads, while the latter is better for analytical queries. This decision has a significant impact on the entire data system architecture. The multiple-decade-long journey of these two designs has led to a new family of hybrid transactional/analytical processing (HTAP) architectures. Several efforts have sought to reap the benefits of both worlds by proposing systems that maintain multiple copies of data (in different physical layouts) and convert them into the desired layout as required. Due to data duplication, the additional necessary bookkeeping, and the cost of converting data between different layouts, these systems compromise between efficient analytics and data freshness. We depart from existing designs by proposing a radically new approach. We ask the question: “What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns?” To achieve this functionality, we capitalize on the reinvigorated trend of hardware specialization (that has been accelerated due to the tapering of Moore's law) to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which profoundly impacts the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making HTAP systems viable using a single layout. (B) It simplifies the memory and storage manager that needs to maintain and update a single data layout. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance. 
In this paper, we present Relational Fabric for both memory and storage. We present our initial results on Relational Fabric for in-memory systems and discuss the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query optimization, query evaluation, and concurrency control.
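The row-to-column conversion this abstract describes can be mimicked in software to make the idea concrete. The sketch below is only an emulation under stated assumptions: the paper proposes doing this in hardware near the data, and the row layout `ROW`, the function `project`, and the column indices are hypothetical and not part of the Relational Fabric API.

```python
import struct

# Hypothetical packed row layout: int32 id, int64 quantity, float64 price.
ROW = struct.Struct("<iqd")


def project(buffer, columns):
    """Yield only the requested column indices from a buffer of packed rows."""
    for row in ROW.iter_unpack(buffer):
        yield tuple(row[c] for c in columns)


# Two rows stored contiguously in row-major order.
rows = ROW.pack(1, 10, 2.5) + ROW.pack(2, 20, 4.0)

# An analytical query that needs only id and price sees just those fields.
print(list(project(rows, (0, 2))))
```

The base data keeps a single row-oriented layout, and the consumer receives only the column group it asked for, which is the effect the abstract attributes to pushing vertical partitioning below the software stack.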
Full Text Available
